lost the format -madeinchina(强力万能胶) @卡尔加里华人论坛:卡尔加里枫下论坛 The Rolia Forum of Calgary

工作学习 / 专业技术讨论 / 请教一个SQL QUERY问题 -madeinchina(强力万能胶); 2007-8-3 {283} (#3848058@0)
A table in SQL database has hundreds of columns. It is also a big fact table having millions of rows. If all columns have null value, I need to delete this row from this table. Is there any way to do that? It can be a SQL statement or SSIS task but not a external program.

Thanks.

make table query, add condition to column: <> "" -7y7(醉里吴音); 2007-8-3 (#3848147@0)

Make table query only in access tho. from what i understand is that he is complaining the table is too big and he is not going to type the column name and concatenate them together as the condition. -sunday8(sunday8); 2007-8-3 {115} (#3848183@0)
there is a way to do it. by auto list all the column name and contruct a dynamic T-SQL. and do what you need to do.

可以动态生成query。 -bdbs(不多不少); 2007-8-3 {196} (#3848202@0)
DECLARE @sql VARCHAR(5000)

SET @sql = 'DELETE tableName WHERE 1 = 1'

SELECT @sql = @sql + ' AND ' + name + ' IS NULL' from syscolumns where id = OBJECT_ID('tableName')

EXEC (@sql)

高手. -findinghouse(不写错别字。8); 2007-8-3 (#3848249@0)

first thought -madeinchina(强力万能胶); 2007-8-3 {610} (#3848285@0)
I forgot to mention this table has a primary key which is not null for sure. I don't want to do something like column1 is not null and column2 is not null .... columnN is not null. I think i can do this: do an unpivot by the primary key to create a virtual table having two columns (id, value), create a temp table. create table #temp (id) select into #temp(id) select id from unpivot_table group by column_id having sum(isnull(column_value,0))=0 if this is OK, i will delete any rows having the id in #temp. I am not familiar with unpivot so I am going over it. Will update this post soon.

lost the format -madeinchina(强力万能胶); 2007-8-3 {615} (#3848296@0)
I forgot to mention this table has a primary key which is not null for sure. I don't want to do something like column1 is not null and column2 is not null .... columnN is not null.

I think i can do this:

do an unpivot by the primary key to create a virtual table having two columns (id, value), create a temp table.

create table #temp (id)
select into #temp(id)
select id
from unpivot_table
group by column_id having sum(isnull(column_value,0))=0

if this is OK, i will delete any rows having the id in #temp.

I am not familiar with unpivot so I am going over it. Will update this post soon.

如果仅仅是为了有non-nullable的field，稍微修正一下就可以了。 -bdbs(不多不少); 2007-8-3 {216} (#3848320@0)
DECLARE @sql VARCHAR(5000)

SET @sql = 'DELETE tableName WHERE 1 = 1'

SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('tableName') AND IsNullable = 1

EXEC (@sql)

I mean this big table has some rows having all columns except its key column. I need to delete these rows. -madeinchina(强力万能胶); 2007-8-3 (#3848345@0)

I mean this big table has some rows having all columns value NULL except its key column. I need to delete these rows. -madeinchina(强力万能胶); 2007-8-3 (#3848359@0)

#3848320 难道不能解决你的这个问题么？ -bdbs(不多不少); 2007-8-3 (#3848386@0)

或者? -guestagain(guest again); 2007-8-3 {225} (#3848531@0)
DECLARE @sql VARCHAR(5000)

SET @sql = 'DELETE tableName WHERE 1 = 1'

SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('tableName') AND name <> key_column_name

EXEC (@sql)

那样的话，不如这样 -bdbs(不多不少); 2007-8-3 {312} (#3848556@0)
DECLARE @sql VARCHAR(5000)

SET @sql = 'DELETE tableName WHERE 1 = 1'

SELECT @sql = @sql + ' AND ' + name + ' IS NULL'
FROM syscolumns
WHERE id = OBJECT_ID('tableName')
AND name NOT IN (SELECT COLUMN_NAME FROM INFORMATION_SCHEMA.KEY_COLUMN_USAGE WHERE TABLE_NAME = 'tableName')

EXEC (@sql)

呵呵，你确定他所谓的key column是你认为的那种key column？ -guestagain(guest again); 2007-8-3 (#3848596@0)

不确定了。：（ -bdbs(不多不少); 2007-8-3 (#3848772@0)

You copy my idea too. Just kidding. -deep_blue(BLUE); 2007-8-4 (#3849991@0)

"this big table has some rows having all columns except its key column" －这怎么可能？key column怎么可能是nullable的？你用中文表述吧。 -bdbs(不多不少); 2007-8-3 (#3848382@0)

2nd thought, becasue I found... -madeinchina(强力万能胶); 2007-8-3 {1051} (#3848351@0)
本文发表在 rolia.net 枫下论坛In my distinct select, record 6 doesn't show. Hmmmmmmmmm.... That means anybody doesn't show up in the party will be...

CREATE TABLE #pvt (VendorID int, Emp1 int, Emp2 int,
Emp3 int, Emp4 int, Emp5 int)
GO
INSERT INTO #pvt VALUES (1,4,3,5,4,4)
INSERT INTO #pvt VALUES (2,4,1,5,5,5)
INSERT INTO #pvt VALUES (3,null,null,null,null,4)
INSERT INTO #pvt VALUES (4,4,2,null,null,4)
INSERT INTO #pvt VALUES (5,5,1,5,5,null)
INSERT INTO #pvt VALUES (6,null,null,null,null,null)
INSERT INTO #pvt VALUES (7,null,0,1,2,4)
GO

select * from #pvt

--Unpivot the table.
SELECT VendorID, Employee, Orders
FROM
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM #pvt) p
UNPIVOT
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt
GO

select distinct VendorID
from
(SELECT VendorID, Emp1, Emp2, Emp3, Emp4, Emp5
FROM #pvt) p
UNPIVOT
(Orders FOR Employee IN
(Emp1, Emp2, Emp3, Emp4, Emp5)
)AS unpvt

select vendorid
from #pvt
group by vendorid
having sum(Emp1)=8

drop table #pvt更多精彩文章及讨论，请光临枫下论坛 rolia.net

#3848320 不行么？想那么复杂干吗？脑筋急转弯？ -guestagain(guest again); 2007-8-3 (#3848515@0)

You can also use powerful INFORMATION SCHEMA view. -deep_blue(BLUE); 2007-8-3 {258} (#3848436@0)
DECLARE @SQL VARCHAR(5000)

SET @SQL = 'DELETE FROM YOUR_TABLE_NAME WHERE 1=1 '

SELECT @SQL = @SQL + ' AND ' + COLUMN_NAME + ' IS NULL ' from INFORMATION_SCHEMA.COLUMNS Where IS_NULLABLE = 'YES' and TABLE_NAME = YOUR_TABLE_NAME

EXEC (@SQL)

你这样抄袭我的CODE -bdbs(不多不少); 2007-8-3 {40} (#3848510@0)
会让人以为你是俺的马甲呢。呵呵，开玩笑。

Here is the sample -madeinchina(强力万能胶); 2007-8-3 {706} (#3848695@0)
Following is the sample table. In this table, VendorId is primary key and all other columns are nullable. So, if I insert rows with following statements, I want to delete row 6 becasue it has all columns having null value. This sample table has 6 colums. The one I have does have about 150 columns.

CREATE TABLE #pvt (VendorID int, Emp1 int, Emp2 int,
Emp3 int, Emp4 int, Emp5 int)
GO
INSERT INTO #pvt VALUES (1,4,3,5,4,4)
INSERT INTO #pvt VALUES (2,4,1,5,5,5)
INSERT INTO #pvt VALUES (3,null,null,null,null,4)
INSERT INTO #pvt VALUES (4,4,2,null,null,4)
INSERT INTO #pvt VALUES (5,5,1,5,5,null)
INSERT INTO #pvt VALUES (6,null,null,null,null,null)
INSERT INTO #pvt VALUES (7,null,0,1,2,4)
GO

In my understanding, when you create table, you don’t actual neither set VendorID as PK or specify it as non-nullable, you should explicitly exclude it in query: -deep_blue(BLUE); 2007-8-4 {185} (#3849997@0)
SELECT @SQL = @SQL + ' AND ' + COLUMN_NAME + ' IS NULL ' from INFORMATION_SCHEMA.COLUMNS Where IS_NULLABLE = 'YES' and TABLE_NAME = YOUR_TABLE_NAME AND COLUMN_NAME <>’VendorID’

bdbs(不多不少), i tried your query -madeinchina(强力万能胶); 2007-8-3 {194} (#3848729@0)
it generated a statement like:

delete from table_name
where column1 is null and column2 is null and ...and...

it worked, for sure. But the performance..., I need to do a test.

Thanks.
数据加载中 -madeinchina(强力万能胶); 2007-8-3 {454} (#3848842@0)
This table is a financing statement table storing all data collected from the whole world. It has 147 columns. One is primary key column, others are float type, nullable. I am loading all data in through several SSIS packages. It will take about 3 hours (at least) to update the whole database.

I will post the query performance result tomorrow. The query is created by bdbs(不多不少).

Thanks, everyone, especially to bdbs and have a good weekend.

And the performance is ...? -bdbs(不多不少); 2007-8-8 (#3856425@0)

Finally, I got the data loaded in database -madeinchina(强力万能胶); 2007-8-8 {408} (#3857787@0)
On a desktop with dual-core 3.4G, 4G memory, Windows XP2, SQL 2005 with SP2. A table having 144 columns, 596000 rows of records, running this query

DECLARE @sql VARCHAR(max)
SET @sql = 'select count(*) from cf WHERE 1 = 1'
SELECT @sql = @sql + ' AND ' + name + ' IS NULL' FROM syscolumns WHERE id = OBJECT_ID('cf') AND IsNullable = 1
EXEC (@sql)

It took 15 seconds to finish. Row count is 0.

ROW count = 0? Does it mean you deleted nothing? -deep_blue(BLUE); 2007-8-9 (#3858376@0)

LZ 改用select count(*) 了，说明未发现脏数据。 -bdbs(不多不少); 2007-8-9 (#3858741@0)

It is a very trick question to prove the correctness of a query. -deep_blue(BLUE); 2007-8-9 {125} (#3859127@0)
If you run something like:

DELETE FROM Your_Table WHERE 1=2
SELECT COUNT(*) FROM Your_Table WHERE 1=2

What do you get?

0。0。不懂你问这么简单的问题干什么。也不明白什么地方trick了。 -bdbs(不多不少); 2007-8-9 (#3859160@0)

The point is you cannot use deleting condition to prove you deleted correct rows. -deep_blue(BLUE); 2007-8-9 (#3859178@0)

恕我愚钝，还是不懂。 -bdbs(不多不少); 2007-8-9 (#3859253@0)

No matter what your deleting condition is, your SELECT COUNT(*) FROM TABLE_NAME deleting condition Always return 0. How can you depended it find out your deleting condition is right!? -deep_blue(BLUE); 2007-8-9 (#3859590@0)

来来回回看了好几遍，我想你是没看明白前面的贴子吧。不然就是我仍旧#3859253 -bdbs(不多不少); 2007-8-9 (#3859663@0)

you did not take your primary key out from the quary. -winstonwx(绿芒果); 2007-8-9 {363} (#3858490@0)
The following one will be better and even including empty strings:

declare @sql varchar(max)
set @sql='select * from cf where 1=1 '

select @sql=@sql + ' and (' + NAME + ' IS NULL OR '+ NAME +'='''')'
FROM SYSCOLUMNS
WHERE ID=OBJECT_ID('cf') AND
NAME != 'ID' ---Identity Columns

the above quary will selsect the empty rows.

IsNullable = 1 is enough to eliminate the primary key -bdbs(不多不少); 2007-8-9 (#3858739@0)

you are right. -winstonwx(绿芒果); 2007-8-9 (#3859418@0)

That's why you got 0 rows. -winstonwx(绿芒果); 2007-8-9 (#3858494@0)

这种脏数据应该在加载时就过滤掉。 -newkid(newkid); 2007-8-9 (#3858569@0)

我想LZ就是在做这项工作，在加载数据前对数据进行预处理。 -bdbs(不多不少); 2007-8-9 (#3858744@0)

It is also a big fact table having millions of rows...这不像是staging数据呀 -newkid(newkid); 2007-8-9 (#3858761@0)

The reason why we do this -madeinchina(强力万能胶); 2007-8-9 {268} (#3858845@0)
In order to improve performance, we denormalize lots of related tables into a single fact table by pivoting. It is very hard to tell the combination will have a row with all nulls in the middle of the process. So, we need to do the data cleaning at the end of the ETL.

When you insert an empty row, is it still in the middle of the process? If yes, then this row should be inserted to the staging table and deleted when ETL is done; if no, then you have a choice not to insert this row -newkid(newkid); 2007-8-9 (#3858865@0)

Yes. You will have one row from here, one row from there. When you merge these rows into single row, you don't know this row will have all null values beforehand. -madeinchina(强力万能胶); 2007-8-9 (#3858901@0)

Then use a staging table. Before you insert into the target table, run your dynamic SQL to delete this row from the staging table. Since intermediate data set is usually not big, performance is not a concern. -newkid(newkid); 2007-8-9 (#3858960@0)

@Calgary

lost the format

Replies, comments and Discussions:

More Topics